Date Source Site ID POC Daily Mean PM2.5 Concentration Units
1 01/05/2002 AQS 60010007 1 25.1 ug/m3 LC
2 01/06/2002 AQS 60010007 1 31.6 ug/m3 LC
3 01/08/2002 AQS 60010007 1 21.4 ug/m3 LC
4 01/11/2002 AQS 60010007 1 25.9 ug/m3 LC
5 01/14/2002 AQS 60010007 1 34.5 ug/m3 LC
6 01/17/2002 AQS 60010007 1 41.0 ug/m3 LC
Daily AQI Value Local Site Name Daily Obs Count Percent Complete
1 81 Livermore 1 100
2 93 Livermore 1 100
3 74 Livermore 1 100
4 82 Livermore 1 100
5 98 Livermore 1 100
6 115 Livermore 1 100
AQS Parameter Code AQS Parameter Description Method Code
1 88101 PM2.5 - Local Conditions 120
2 88101 PM2.5 - Local Conditions 120
3 88101 PM2.5 - Local Conditions 120
4 88101 PM2.5 - Local Conditions 120
5 88101 PM2.5 - Local Conditions 120
6 88101 PM2.5 - Local Conditions 120
Method Description CBSA Code
1 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS 41860
2 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS 41860
3 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS 41860
4 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS 41860
5 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS 41860
6 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS 41860
CBSA Name State FIPS Code State County FIPS Code
1 San Francisco-Oakland-Hayward, CA 6 California 1
2 San Francisco-Oakland-Hayward, CA 6 California 1
3 San Francisco-Oakland-Hayward, CA 6 California 1
4 San Francisco-Oakland-Hayward, CA 6 California 1
5 San Francisco-Oakland-Hayward, CA 6 California 1
6 San Francisco-Oakland-Hayward, CA 6 California 1
County Site Latitude Site Longitude
1 Alameda 37.68753 -121.7842
2 Alameda 37.68753 -121.7842
3 Alameda 37.68753 -121.7842
4 Alameda 37.68753 -121.7842
5 Alameda 37.68753 -121.7842
6 Alameda 37.68753 -121.7842
tail(old)
Date Source Site ID POC Daily Mean PM2.5 Concentration Units
15971 12/10/2002 AQS 61131003 1 15 ug/m3 LC
15972 12/13/2002 AQS 61131003 1 15 ug/m3 LC
15973 12/22/2002 AQS 61131003 1 1 ug/m3 LC
15974 12/25/2002 AQS 61131003 1 23 ug/m3 LC
15975 12/28/2002 AQS 61131003 1 5 ug/m3 LC
15976 12/31/2002 AQS 61131003 1 6 ug/m3 LC
Daily AQI Value Local Site Name Daily Obs Count Percent Complete
15971 62 Woodland-Gibson Road 1 100
15972 62 Woodland-Gibson Road 1 100
15973 6 Woodland-Gibson Road 1 100
15974 77 Woodland-Gibson Road 1 100
15975 28 Woodland-Gibson Road 1 100
15976 33 Woodland-Gibson Road 1 100
AQS Parameter Code AQS Parameter Description Method Code
15971 88101 PM2.5 - Local Conditions 117
15972 88101 PM2.5 - Local Conditions 117
15973 88101 PM2.5 - Local Conditions 117
15974 88101 PM2.5 - Local Conditions 117
15975 88101 PM2.5 - Local Conditions 117
15976 88101 PM2.5 - Local Conditions 117
Method Description CBSA Code
15971 R & P Model 2000 PM2.5 Sampler w/WINS 40900
15972 R & P Model 2000 PM2.5 Sampler w/WINS 40900
15973 R & P Model 2000 PM2.5 Sampler w/WINS 40900
15974 R & P Model 2000 PM2.5 Sampler w/WINS 40900
15975 R & P Model 2000 PM2.5 Sampler w/WINS 40900
15976 R & P Model 2000 PM2.5 Sampler w/WINS 40900
CBSA Name State FIPS Code State
15971 Sacramento--Roseville--Arden-Arcade, CA 6 California
15972 Sacramento--Roseville--Arden-Arcade, CA 6 California
15973 Sacramento--Roseville--Arden-Arcade, CA 6 California
15974 Sacramento--Roseville--Arden-Arcade, CA 6 California
15975 Sacramento--Roseville--Arden-Arcade, CA 6 California
15976 Sacramento--Roseville--Arden-Arcade, CA 6 California
County FIPS Code County Site Latitude Site Longitude
15971 113 Yolo 38.66121 -121.7327
15972 113 Yolo 38.66121 -121.7327
15973 113 Yolo 38.66121 -121.7327
15974 113 Yolo 38.66121 -121.7327
15975 113 Yolo 38.66121 -121.7327
15976 113 Yolo 38.66121 -121.7327
str(old)
'data.frame': 15976 obs. of 22 variables:
$ Date : chr "01/05/2002" "01/06/2002" "01/08/2002" "01/11/2002" ...
$ Source : chr "AQS" "AQS" "AQS" "AQS" ...
$ Site ID : int 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 ...
$ POC : int 1 1 1 1 1 1 1 1 1 1 ...
$ Daily Mean PM2.5 Concentration: num 25.1 31.6 21.4 25.9 34.5 41 29.3 15 18.8 37.9 ...
$ Units : chr "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" ...
$ Daily AQI Value : int 81 93 74 82 98 115 89 62 69 107 ...
$ Local Site Name : chr "Livermore" "Livermore" "Livermore" "Livermore" ...
$ Daily Obs Count : int 1 1 1 1 1 1 1 1 1 1 ...
$ Percent Complete : num 100 100 100 100 100 100 100 100 100 100 ...
$ AQS Parameter Code : int 88101 88101 88101 88101 88101 88101 88101 88101 88101 88101 ...
$ AQS Parameter Description : chr "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" ...
$ Method Code : int 120 120 120 120 120 120 120 120 120 120 ...
$ Method Description : chr "Andersen RAAS2.5-300 PM2.5 SEQ w/WINS" "Andersen RAAS2.5-300 PM2.5 SEQ w/WINS" "Andersen RAAS2.5-300 PM2.5 SEQ w/WINS" "Andersen RAAS2.5-300 PM2.5 SEQ w/WINS" ...
$ CBSA Code : int 41860 41860 41860 41860 41860 41860 41860 41860 41860 41860 ...
$ CBSA Name : chr "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" ...
$ State FIPS Code : int 6 6 6 6 6 6 6 6 6 6 ...
$ State : chr "California" "California" "California" "California" ...
$ County FIPS Code : int 1 1 1 1 1 1 1 1 1 1 ...
$ County : chr "Alameda" "Alameda" "Alameda" "Alameda" ...
$ Site Latitude : num 37.7 37.7 37.7 37.7 37.7 ...
$ Site Longitude : num -122 -122 -122 -122 -122 ...
summary(old)
Date Source Site ID POC
Length:15976 Length:15976 Min. :60010007 Min. :1.000
Class :character Class :character 1st Qu.:60290014 1st Qu.:1.000
Mode :character Mode :character Median :60590007 Median :1.000
Mean :60549600 Mean :1.581
3rd Qu.:60731002 3rd Qu.:1.000
Max. :61131003 Max. :6.000
Daily Mean PM2.5 Concentration Units Daily AQI Value
Min. : 0.00 Length:15976 Min. : 0.00
1st Qu.: 7.00 Class :character 1st Qu.: 39.00
Median : 12.00 Mode :character Median : 56.00
Mean : 16.12 Mean : 59.28
3rd Qu.: 20.50 3rd Qu.: 72.00
Max. :104.30 Max. :185.00
Local Site Name Daily Obs Count Percent Complete AQS Parameter Code
Length:15976 Min. :1 Min. :100 Min. :88101
Class :character 1st Qu.:1 1st Qu.:100 1st Qu.:88101
Mode :character Median :1 Median :100 Median :88101
Mean :1 Mean :100 Mean :88215
3rd Qu.:1 3rd Qu.:100 3rd Qu.:88502
Max. :1 Max. :100 Max. :88502
AQS Parameter Description Method Code Method Description CBSA Code
Length:15976 Min. :117 Length:15976 Min. :12540
Class :character 1st Qu.:120 Class :character 1st Qu.:23420
Mode :character Median :120 Mode :character Median :40140
Mean :297 Mean :33270
3rd Qu.:707 3rd Qu.:41740
Max. :810 Max. :49700
NA's :929
CBSA Name State FIPS Code State County FIPS Code
Length:15976 Min. :6 Length:15976 Min. : 1.00
Class :character 1st Qu.:6 Class :character 1st Qu.: 29.00
Mode :character Median :6 Mode :character Median : 59.00
Mean :6 Mean : 54.78
3rd Qu.:6 3rd Qu.: 73.00
Max. :6 Max. :113.00
County Site Latitude Site Longitude
Length:15976 Min. :32.63 Min. :-124.2
Class :character 1st Qu.:34.07 1st Qu.:-121.4
Mode :character Median :35.36 Median :-119.1
Mean :36.00 Mean :-119.4
3rd Qu.:37.77 3rd Qu.:-117.9
Max. :41.71 Max. :-115.5
mean(is.na(old$`Daily Mean PM2.5 Concentration`))
[1] 0
summary(old$`Daily Mean PM2.5 Concentration`)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 7.00 12.00 16.12 20.50 104.30
head(new)
Date Source Site ID POC Daily Mean PM2.5 Concentration Units
1 01/01/2022 AQS 60010007 3 12.7 ug/m3 LC
2 01/02/2022 AQS 60010007 3 13.9 ug/m3 LC
3 01/03/2022 AQS 60010007 3 7.1 ug/m3 LC
4 01/04/2022 AQS 60010007 3 3.7 ug/m3 LC
5 01/05/2022 AQS 60010007 3 4.2 ug/m3 LC
6 01/06/2022 AQS 60010007 3 3.8 ug/m3 LC
Daily AQI Value Local Site Name Daily Obs Count Percent Complete
1 58 Livermore 1 100
2 60 Livermore 1 100
3 39 Livermore 1 100
4 21 Livermore 1 100
5 23 Livermore 1 100
6 21 Livermore 1 100
AQS Parameter Code AQS Parameter Description Method Code
1 88101 PM2.5 - Local Conditions 170
2 88101 PM2.5 - Local Conditions 170
3 88101 PM2.5 - Local Conditions 170
4 88101 PM2.5 - Local Conditions 170
5 88101 PM2.5 - Local Conditions 170
6 88101 PM2.5 - Local Conditions 170
Method Description CBSA Code
1 Met One BAM-1020 Mass Monitor w/VSCC 41860
2 Met One BAM-1020 Mass Monitor w/VSCC 41860
3 Met One BAM-1020 Mass Monitor w/VSCC 41860
4 Met One BAM-1020 Mass Monitor w/VSCC 41860
5 Met One BAM-1020 Mass Monitor w/VSCC 41860
6 Met One BAM-1020 Mass Monitor w/VSCC 41860
CBSA Name State FIPS Code State County FIPS Code
1 San Francisco-Oakland-Hayward, CA 6 California 1
2 San Francisco-Oakland-Hayward, CA 6 California 1
3 San Francisco-Oakland-Hayward, CA 6 California 1
4 San Francisco-Oakland-Hayward, CA 6 California 1
5 San Francisco-Oakland-Hayward, CA 6 California 1
6 San Francisco-Oakland-Hayward, CA 6 California 1
County Site Latitude Site Longitude
1 Alameda 37.68753 -121.7842
2 Alameda 37.68753 -121.7842
3 Alameda 37.68753 -121.7842
4 Alameda 37.68753 -121.7842
5 Alameda 37.68753 -121.7842
6 Alameda 37.68753 -121.7842
tail(new)
Date Source Site ID POC Daily Mean PM2.5 Concentration Units
59751 12/01/2022 AQS 61131003 1 3.4 ug/m3 LC
59752 12/07/2022 AQS 61131003 1 3.8 ug/m3 LC
59753 12/13/2022 AQS 61131003 1 6.0 ug/m3 LC
59754 12/19/2022 AQS 61131003 1 34.8 ug/m3 LC
59755 12/25/2022 AQS 61131003 1 23.2 ug/m3 LC
59756 12/31/2022 AQS 61131003 1 1.0 ug/m3 LC
Daily AQI Value Local Site Name Daily Obs Count Percent Complete
59751 19 Woodland-Gibson Road 1 100
59752 21 Woodland-Gibson Road 1 100
59753 33 Woodland-Gibson Road 1 100
59754 99 Woodland-Gibson Road 1 100
59755 77 Woodland-Gibson Road 1 100
59756 6 Woodland-Gibson Road 1 100
AQS Parameter Code AQS Parameter Description Method Code
59751 88101 PM2.5 - Local Conditions 145
59752 88101 PM2.5 - Local Conditions 145
59753 88101 PM2.5 - Local Conditions 145
59754 88101 PM2.5 - Local Conditions 145
59755 88101 PM2.5 - Local Conditions 145
59756 88101 PM2.5 - Local Conditions 145
Method Description CBSA Code
59751 R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC 40900
59752 R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC 40900
59753 R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC 40900
59754 R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC 40900
59755 R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC 40900
59756 R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC 40900
CBSA Name State FIPS Code State
59751 Sacramento--Roseville--Arden-Arcade, CA 6 California
59752 Sacramento--Roseville--Arden-Arcade, CA 6 California
59753 Sacramento--Roseville--Arden-Arcade, CA 6 California
59754 Sacramento--Roseville--Arden-Arcade, CA 6 California
59755 Sacramento--Roseville--Arden-Arcade, CA 6 California
59756 Sacramento--Roseville--Arden-Arcade, CA 6 California
County FIPS Code County Site Latitude Site Longitude
59751 113 Yolo 38.66121 -121.7327
59752 113 Yolo 38.66121 -121.7327
59753 113 Yolo 38.66121 -121.7327
59754 113 Yolo 38.66121 -121.7327
59755 113 Yolo 38.66121 -121.7327
59756 113 Yolo 38.66121 -121.7327
str(new)
'data.frame': 59756 obs. of 22 variables:
$ Date : chr "01/01/2022" "01/02/2022" "01/03/2022" "01/04/2022" ...
$ Source : chr "AQS" "AQS" "AQS" "AQS" ...
$ Site ID : int 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 ...
$ POC : int 3 3 3 3 3 3 3 3 3 3 ...
$ Daily Mean PM2.5 Concentration: num 12.7 13.9 7.1 3.7 4.2 3.8 2.3 6.9 13.6 11.2 ...
$ Units : chr "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" ...
$ Daily AQI Value : int 58 60 39 21 23 21 13 38 59 55 ...
$ Local Site Name : chr "Livermore" "Livermore" "Livermore" "Livermore" ...
$ Daily Obs Count : int 1 1 1 1 1 1 1 1 1 1 ...
$ Percent Complete : num 100 100 100 100 100 100 100 100 100 100 ...
$ AQS Parameter Code : int 88101 88101 88101 88101 88101 88101 88101 88101 88101 88101 ...
$ AQS Parameter Description : chr "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" ...
$ Method Code : int 170 170 170 170 170 170 170 170 170 170 ...
$ Method Description : chr "Met One BAM-1020 Mass Monitor w/VSCC" "Met One BAM-1020 Mass Monitor w/VSCC" "Met One BAM-1020 Mass Monitor w/VSCC" "Met One BAM-1020 Mass Monitor w/VSCC" ...
$ CBSA Code : int 41860 41860 41860 41860 41860 41860 41860 41860 41860 41860 ...
$ CBSA Name : chr "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" ...
$ State FIPS Code : int 6 6 6 6 6 6 6 6 6 6 ...
$ State : chr "California" "California" "California" "California" ...
$ County FIPS Code : int 1 1 1 1 1 1 1 1 1 1 ...
$ County : chr "Alameda" "Alameda" "Alameda" "Alameda" ...
$ Site Latitude : num 37.7 37.7 37.7 37.7 37.7 ...
$ Site Longitude : num -122 -122 -122 -122 -122 ...
summary(new)
Date Source Site ID POC
Length:59756 Length:59756 Min. :60010007 Min. : 1.00
Class :character Class :character 1st Qu.:60290019 1st Qu.: 1.00
Mode :character Mode :character Median :60631006 Median : 3.00
Mean :60563315 Mean : 3.77
3rd Qu.:60731026 3rd Qu.: 3.00
Max. :61131003 Max. :24.00
Daily Mean PM2.5 Concentration Units Daily AQI Value
Min. : -6.700 Length:59756 Min. : 0.00
1st Qu.: 4.100 Class :character 1st Qu.: 23.00
Median : 6.800 Mode :character Median : 38.00
Mean : 8.428 Mean : 39.28
3rd Qu.: 10.700 3rd Qu.: 54.00
Max. :302.500 Max. :454.00
Local Site Name Daily Obs Count Percent Complete AQS Parameter Code
Length:59756 Min. :1 Min. :100 Min. :88101
Class :character 1st Qu.:1 1st Qu.:100 1st Qu.:88101
Mode :character Median :1 Median :100 Median :88101
Mean :1 Mean :100 Mean :88192
3rd Qu.:1 3rd Qu.:100 3rd Qu.:88101
Max. :1 Max. :100 Max. :88502
AQS Parameter Description Method Code Method Description CBSA Code
Length:59756 Min. :143 Length:59756 Min. :12540
Class :character 1st Qu.:170 Class :character 1st Qu.:31080
Mode :character Median :170 Mode :character Median :40140
Mean :336 Mean :34957
3rd Qu.:707 3rd Qu.:41860
Max. :810 Max. :49700
NA's :4567
CBSA Name State FIPS Code State County FIPS Code
Length:59756 Min. :6 Length:59756 Min. : 1.00
Class :character 1st Qu.:6 Class :character 1st Qu.: 29.00
Mode :character Median :6 Mode :character Median : 63.00
Mean :6 Mean : 56.19
3rd Qu.:6 3rd Qu.: 73.00
Max. :6 Max. :113.00
County Site Latitude Site Longitude
Length:59756 Min. :32.58 Min. :-124.2
Class :character 1st Qu.:34.07 1st Qu.:-121.4
Mode :character Median :36.49 Median :-119.6
Mean :36.24 Mean :-119.6
3rd Qu.:37.96 3rd Qu.:-117.9
Max. :41.76 Max. :-115.5
mean(is.na(new$`Daily Mean PM2.5 Concentration`))
[1] 0
summary(new$`Daily Mean PM2.5 Concentration`)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-6.700 4.100 6.800 8.428 10.700 302.500
#finding total number of negative daily mean PM2.5 values
length(new[new$`Daily Mean PM2.5 Concentration` < 0, 'Daily Mean PM2.5 Concentration'])
[1] 215
#215 values
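A shorter way to get the same count is to sum the logical comparison directly. The sketch below is only illustrative: the vector `pm` is a made-up stand-in for the 2022 concentration column, not the real data.

```r
# Hypothetical stand-in for new$`Daily Mean PM2.5 Concentration` (values are made up)
pm <- c(12.7, -0.4, 3.8, -6.7, 0, 7.1)

# pm < 0 is a logical vector; sum() counts the TRUE entries
n_negative <- sum(pm < 0)
n_negative

# mean() of the same logical vector gives the share of negative readings
share_negative <- mean(pm < 0)
share_negative
```

Applied to the real column, `sum()` should agree with the `length()` call as long as the column has no missing values, which the `mean(is.na(...))` check above indicates.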
2002 data summary:
For the 2002 dataset, the dimensions are 15,976 rows (observations) by 22 columns (variables).
No apparent data issues.
2022 data summary:
For the 2022 dataset, the dimensions are 59,756 rows (observations) by 22 columns (variables).
The daily mean PM2.5 concentration variable has a minimum of -6.7, which is not physically meaningful for a concentration. In total, 215 of the 59,756 observations (about 0.36%) have a negative daily mean PM2.5 concentration.
Both 2002 and 2022 dataset findings:
Both datasets have three types of variables: character, integer, and numeric.
Character variables: Date, Source, Units, Local Site Name, AQS Parameter Description, Method Description, CBSA Name, State, County
Integer variables: Site ID, POC, Daily AQI Value, Daily Obs Count, AQS Parameter Code, Method Code, CBSA Code, State FIPS Code, County FIPS Code
Numeric variables: Daily Mean PM2.5 Concentration, Percent Complete, Site Latitude, Site Longitude
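The grouping of variables by type can also be produced programmatically rather than read off `str()`. This is a minimal sketch on a made-up three-column data frame (`toy` and its values are assumptions; only the column names mirror the EPA file).

```r
# Hypothetical toy data frame with one column of each type
toy <- data.frame(
  Date = c("01/05/2002", "01/06/2002"),
  POC = c(1L, 1L),
  `Daily Mean PM2.5 Concentration` = c(25.1, 31.6),
  check.names = FALSE,       # keep spaces in column names, as in the source data
  stringsAsFactors = FALSE
)

# class() of every column, returned as a named character vector
col_types <- sapply(toy, class)

# Group the column names by their class
split(names(col_types), col_types)
```

Running the same two lines on `old` or `new` should reproduce the three lists above.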
Step 2
#combining two years into one data frame
both <- rbind(old, new)
dim(both)
[1] 75732 22
#creating new column for year
both$Year <- format(as.Date(both$Date, format = "%m/%d/%Y"), "%Y")
#changing names of key variables
names(both)[names(both) == "Daily Mean PM2.5 Concentration"] <- "pm2.5mean"
names(both)[names(both) == "Site Latitude"] <- "lat"
names(both)[names(both) == "Site Longitude"] <- "lon"
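The year extraction and renaming can be checked on a tiny example before running them on the full data. The sketch below uses a made-up two-row data frame (`toy` is an assumption) and, unlike the code above, converts the year to an integer, which can be more convenient for numeric plot axes.

```r
# Hypothetical two-row stand-in for the combined data
toy <- data.frame(
  Date = c("01/05/2002", "12/31/2022"),
  `Daily Mean PM2.5 Concentration` = c(25.1, 1.0),
  check.names = FALSE
)

# Parse month/day/year strings, then extract the four-digit year as an integer
toy$Year <- as.integer(format(as.Date(toy$Date, format = "%m/%d/%Y"), "%Y"))

# Rename by matching on the old name, so the column position does not matter
names(toy)[names(toy) == "Daily Mean PM2.5 Concentration"] <- "pm2.5mean"

toy
```

Keeping `Year` as a character string, as the original code does, works equally well for grouping and for the `Year == 2002` subsetting used later.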
Step 3
library(leaflet)
old2 <- both[both$Year == 2002, ]
new2 <- both[both$Year == 2022, ]
#one map with both years
leaflet() %>%
  addProviderTiles('OpenStreetMap') %>%
  addCircles(data = old2, lat = ~lat, lng = ~lon, popup = "2002", opacity = 1, fillOpacity = 1, radius = 100, color = "blue") %>%
  addCircles(data = new2, lat = ~lat, lng = ~lon, popup = "2022", opacity = 1, fillOpacity = 1, radius = 100, color = "red")
#separate map for 2002
leaflet() %>%
  addProviderTiles('OpenStreetMap') %>%
  addCircles(data = old2, lat = ~lat, lng = ~lon, popup = "2002", opacity = 1, fillOpacity = 1, radius = 100, color = "blue")
#separate map for 2022
leaflet() %>%
  addProviderTiles('OpenStreetMap') %>%
  addCircles(data = new2, lat = ~lat, lng = ~lon, popup = "2022", opacity = 1, fillOpacity = 1, radius = 100, color = "red")
Summary of spatial distribution:
The leaflet maps indicate that there are more data points for 2022 (red) than for 2002 (blue). For both 2002 and 2022, the data points are distributed throughout California, with clusters around Los Angeles and the Bay Area. Compared to the 2022 data, the 2002 data points are sparser - especially in the central area.
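The visual impression of "more points in 2022" can be backed by counting distinct monitoring sites per year, since each site contributes many daily rows. The sketch below is illustrative only: `toy`, its `site_id` column (standing in for `Site ID`), and the values are assumptions.

```r
# Hypothetical daily rows: several readings per site and year
toy <- data.frame(
  Year = c("2002", "2002", "2002", "2022", "2022", "2022", "2022"),
  site_id = c(60010007, 60010007, 61131003, 60010007, 60010011, 61131003, 61131003)
)

# Drop repeated (Year, site) pairs so each site counts once per year
sites <- unique(toy[, c("Year", "site_id")])

# Number of distinct sites in each year
sites_per_year <- table(sites$Year)
sites_per_year
```

Passing the de-duplicated site table (with its coordinates) to `addCircles()` instead of every daily row would also avoid drawing thousands of overlapping circles at the same location.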
#Summarize the spatial distribution of the monitoring sites??
#might help to make two maps, one for each year because a lot of the stations haven’t moved
Step 4
sum(is.na(both$pm2.5mean))
[1] 0
#there are 0 missing values of PM 2.5
both <- both[!is.na(both$pm2.5mean), ]
both <- both[order(both$pm2.5mean), ]
head(both)
Date Source Site ID POC pm2.5mean Units Daily AQI Value
42912 09/20/2022 AQS 60571001 5 -6.7 ug/m3 LC 0
42911 09/19/2022 AQS 60571001 5 -6.3 ug/m3 LC 0
42913 09/21/2022 AQS 60571001 5 -5.1 ug/m3 LC 0
42896 09/03/2022 AQS 60571001 5 -4.7 ug/m3 LC 0
42914 09/22/2022 AQS 60571001 5 -4.7 ug/m3 LC 0
42897 09/04/2022 AQS 60571001 5 -4.1 ug/m3 LC 0
Local Site Name Daily Obs Count Percent Complete AQS Parameter Code
42912 Truckee-Fire Station 1 100 88502
42911 Truckee-Fire Station 1 100 88502
42913 Truckee-Fire Station 1 100 88502
42896 Truckee-Fire Station 1 100 88502
42914 Truckee-Fire Station 1 100 88502
42897 Truckee-Fire Station 1 100 88502
AQS Parameter Description Method Code
42912 Acceptable PM2.5 AQI & Speciation Mass 733
42911 Acceptable PM2.5 AQI & Speciation Mass 733
42913 Acceptable PM2.5 AQI & Speciation Mass 733
42896 Acceptable PM2.5 AQI & Speciation Mass 733
42914 Acceptable PM2.5 AQI & Speciation Mass 733
42897 Acceptable PM2.5 AQI & Speciation Mass 733
Method Description CBSA Code CBSA Name
42912 Met-One BAM W/PM2.5 VSCC 46020 Truckee-Grass Valley, CA
42911 Met-One BAM W/PM2.5 VSCC 46020 Truckee-Grass Valley, CA
42913 Met-One BAM W/PM2.5 VSCC 46020 Truckee-Grass Valley, CA
42896 Met-One BAM W/PM2.5 VSCC 46020 Truckee-Grass Valley, CA
42914 Met-One BAM W/PM2.5 VSCC 46020 Truckee-Grass Valley, CA
42897 Met-One BAM W/PM2.5 VSCC 46020 Truckee-Grass Valley, CA
State FIPS Code State County FIPS Code County lat lon
42912 6 California 57 Nevada 39.32783 -120.1846
42911 6 California 57 Nevada 39.32783 -120.1846
42913 6 California 57 Nevada 39.32783 -120.1846
42896 6 California 57 Nevada 39.32783 -120.1846
42914 6 California 57 Nevada 39.32783 -120.1846
42897 6 California 57 Nevada 39.32783 -120.1846
Year
42912 2022
42911 2022
42913 2022
42896 2022
42914 2022
42897 2022
summary(both$pm2.5mean)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-6.70 4.50 7.60 10.05 12.20 302.50
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
#dataset with only negative mean PM2.5 values
neg <- both[both$pm2.5mean < 0, ]
summary(neg)
Date Source Site ID POC
Length:215 Length:215 Min. :60010011 Min. :1.00
Class :character Class :character 1st Qu.:60292009 1st Qu.:3.00
Mode :character Mode :character Median :60651016 Median :3.00
Mean :60614750 Mean :2.67
3rd Qu.:60831008 3rd Qu.:3.00
Max. :61130004 Max. :5.00
pm2.5mean Units Daily AQI Value Local Site Name
Min. :-6.700 Length:215 Min. :0 Length:215
1st Qu.:-0.800 Class :character 1st Qu.:0 Class :character
Median :-0.400 Mode :character Median :0 Mode :character
Mean :-0.707 Mean :0
3rd Qu.:-0.200 3rd Qu.:0
Max. :-0.100 Max. :0
Daily Obs Count Percent Complete AQS Parameter Code AQS Parameter Description
Min. :1 Min. :100 Min. :88101 Length:215
1st Qu.:1 1st Qu.:100 1st Qu.:88101 Class :character
Median :1 Median :100 Median :88101 Mode :character
Mean :1 Mean :100 Mean :88252
3rd Qu.:1 3rd Qu.:100 3rd Qu.:88502
Max. :1 Max. :100 Max. :88502
Method Code Method Description CBSA Code CBSA Name
Min. :170.0 Length:215 Min. :12540 Length:215
1st Qu.:170.0 Class :character 1st Qu.:37100 Class :character
Median :170.0 Mode :character Median :40900 Mode :character
Mean :371.2 Mean :36160
3rd Qu.:731.0 3rd Qu.:42100
Max. :733.0 Max. :47300
NA's :19
State FIPS Code State County FIPS Code County
Min. :6 Length:215 Min. : 1.00 Length:215
1st Qu.:6 Class :character 1st Qu.: 29.00 Class :character
Median :6 Mode :character Median : 65.00 Mode :character
Mean :6 Mean : 61.33
3rd Qu.:6 3rd Qu.: 83.00
Max. :6 Max. :113.00
lat lon Year
Min. :32.84 Min. :-124.2 Length:215
1st Qu.:34.84 1st Qu.:-122.0 Class :character
Median :37.06 Median :-121.1 Mode :character
Mean :37.04 Mean :-120.5
3rd Qu.:38.94 3rd Qu.:-118.9
Max. :41.76 Max. :-115.5
library(ggplot2)
#exploring proportion of neg mean PM 2.5 values
neg |>
  ggplot() +
  geom_bar(mapping = aes(x = pm2.5mean, y = after_stat(prop)))
#shows widest boxplot comes from September 2022
#want to see if they came from a single day or were evenly distributed across timeframe
There are no missing values for the mean PM2.5 concentration in the combined dataset.
The combined dataset contains 215 implausible values, all from 2022. Each is a negative daily mean PM2.5 concentration, and a concentration cannot physically be below zero.
A bar plot of the proportions of the 215 negative values shows that most lie between -2 and 0 (three quarters are between -0.8 and -0.1), with a long tail down to -6.7, so the distribution is left-skewed. A boxplot of the same values shows that the widest range comes from September 2022.
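Whether the negative readings cluster in one month can also be checked numerically. The following is a minimal sketch on made-up rows (`neg_toy` and its values are assumptions, loosely modeled on the Truckee readings shown earlier): it counts negative readings per month and measures their spread.

```r
# Hypothetical stand-in for the `neg` data frame
neg_toy <- data.frame(
  Date = c("09/19/2022", "09/20/2022", "09/21/2022", "01/04/2022", "03/15/2022"),
  pm2.5mean = c(-6.3, -6.7, -5.1, -0.2, -0.4)
)

# Two-digit month extracted from the month/day/year string
neg_toy$month <- format(as.Date(neg_toy$Date, format = "%m/%d/%Y"), "%m")

# Number of negative readings in each month
counts <- table(neg_toy$month)
counts

# Spread (max minus min) of the negative readings within each month
spread <- tapply(neg_toy$pm2.5mean, neg_toy$month, function(x) diff(range(x)))
spread
```

On the real `neg` data, a table by `Date` rather than by month would show whether the September values come from a single day or are spread across several days.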
Step 5
library(ggplot2)
#state line plot
plot(both$Year, both$pm2.5mean, col = factor(both$State))
#plot(old2$pm2.5mean, col = factor(old2$State))
old_hist_state <- hist(old2$pm2.5mean, col = factor(old2$State))
new_hist_state <- hist(new2$pm2.5mean, col = factor(new2$State))
#plot_grid
#state geom line plot
both[!is.na(both$pm2.5mean), ] |>
  ggplot(mapping = aes(x = Year, y = pm2.5mean, color = State)) +
  geom_point() +
  geom_smooth()
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
#boxplot(both$pm2.5mean ~ both$Year, col=factor(both$State))
#take average at the county level. taking average within groups and then put into barplots
#do I remove implausible values or keep them?? -> remove them
State:
Compared to the data from 2002, the 2022 data points have a narrower interquartile range (IQR of 6.6 versus 13.5) and a lower median (6.8 versus 12.0). However, the 2022 data points have a wider overall range, with a much higher maximum (302.5 versus 104.3) and a lower minimum (-6.7, one of the implausible negative values, versus 0). At the state level, the data suggest that daily PM2.5 concentrations in California decreased between 2002 and 2022, although 2022 has more outliers and a wider range.
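The comparison of median, IQR, minimum, and maximum by year can be tabulated in one step. This is a sketch on made-up values (`toy` is an assumption, chosen so that 2022 has a lower median, a narrower IQR, and a wider overall range).

```r
# Hypothetical daily means for two years
toy <- data.frame(
  Year = rep(c("2002", "2022"), each = 5),
  pm2.5mean = c(5, 10, 15, 20, 40, -1, 5, 7, 9, 300)
)

# One row per year, one column per summary statistic
by_year <- t(sapply(split(toy$pm2.5mean, toy$Year), function(x) {
  c(median = median(x), IQR = IQR(x), min = min(x), max = max(x))
}))
by_year
```

Using the median and IQR rather than the mean and standard deviation keeps a handful of extreme days from dominating the comparison.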
County:
Compared to the data from 2002, the 2022 data points seem to have lower interquartile ranges (IQR) overall. However, the 2022 data points have a wider overall range, with much higher maximum values (around 300) and lower minimum values (below 0, i.e. the implausible values). At the county level, the data suggest that daily PM2.5 concentrations in California decreased between 2002 and 2022, although 2022 has more outliers and a wider range.
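County-level averages by year, as mentioned in the code comments, can be computed with `tapply()`. The sketch below uses made-up values (`toy` is an assumption; the county names are taken from the output above only for illustration).

```r
# Hypothetical daily means for two counties and two years
toy <- data.frame(
  County = c("Alameda", "Alameda", "Alameda", "Yolo", "Yolo", "Yolo"),
  Year = c("2002", "2002", "2022", "2002", "2022", "2022"),
  pm2.5mean = c(25, 31, 12, 15, 3, 5)
)

# Mean concentration for every County x Year combination (rows: county, columns: year)
county_means <- tapply(toy$pm2.5mean, list(toy$County, toy$Year), mean)
county_means
```

The resulting matrix can be passed to `barplot(t(county_means), beside = TRUE)` to draw one pair of bars per county.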
Site in Los Angeles:
Compared to the data from 2002, the 2022 data points have a narrower interquartile range (IQR) and lower median. Here it is the 2002 data points that have the wider overall range, with a much higher maximum value (around 80) and lower minimum value (around 0). For the Los Angeles site, the data suggest that daily PM2.5 concentrations decreased between 2002 and 2022.